OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

La correction participative de l’OCR par crowdsourcing au profit des bibliothèques numériques

Internal identifier: 000028 (Main/Exploration); previous: 000027; next: 000029

Authors: Mathieu Andro [France]; Imad Saleh [France]

Source: Bulletin des bibliothèques de France (ISSN 0006-2006), 2015

RBID : Hal:hal-01164263

French descriptors

Abstract

For their digitization projects, libraries often produce OCR containing errors, which can be corrected by service providers employing low-cost labor. But libraries may also appeal to web volunteers (explicit crowdsourcing), to a paid crowd (such as the Amazon Mechanical Turk marketplace), to users who correct OCR by playing games (gamification), or to internet users who do not know that they are correcting OCR (implicit crowdsourcing, as with reCAPTCHA). The profitability of these experiments is compared.

Url: https://hal.archives-ouvertes.fr/hal-01164263


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">La correction participative de l’OCR par crowdsourcing au profit des bibliothèques numériques</title>
<author>
<name sortKey="Andro, Mathieu" sort="Andro, Mathieu" uniqKey="Andro M" first="Mathieu" last="Andro">Mathieu Andro</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-92981" status="VALID">
<orgName>INRA</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-92114" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-92114" type="direct">
<org type="institution" xml:id="struct-92114" status="VALID">
<orgName>Institut National de la Recherche Agronomique</orgName>
<orgName type="acronym">INRA</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.inra.fr</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Saleh, Imad" sort="Saleh, Imad" uniqKey="Saleh I" first="Imad" last="Saleh">Imad Saleh</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-39850" status="VALID">
<orgName>Laboratoire Paragraphe</orgName>
<desc>
<address>
<addrLine>Département Hypermédia - 2 rue de la Liberté - 93526 Saint-Denis cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://paragraphe.info/</ref>
</desc>
<listRelation>
<relation name="EA349" active="#struct-11141" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle name="EA349" active="#struct-11141" type="direct">
<org type="institution" xml:id="struct-11141" status="VALID">
<orgName>Université Paris 8, Vincennes-Saint-Denis</orgName>
<orgName type="acronym">UP8</orgName>
<desc>
<address>
<addrLine>2 rue de la Liberté - 93526 Saint-Denis cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-paris8.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">HAL</idno>
<idno type="RBID">Hal:hal-01164263</idno>
<idno type="halId">hal-01164263</idno>
<idno type="halUri">https://hal.archives-ouvertes.fr/hal-01164263</idno>
<idno type="url">https://hal.archives-ouvertes.fr/hal-01164263</idno>
<date when="2015">2015</date>
<idno type="wicri:Area/Hal/Corpus">000072</idno>
<idno type="wicri:Area/Hal/Curation">000072</idno>
<idno type="wicri:Area/Hal/Checkpoint">000018</idno>
<idno type="wicri:doubleKey">0006-2006:2015:Andro M:la:correction:participative</idno>
<idno type="wicri:Area/Main/Merge">000058</idno>
<idno type="wicri:Area/Main/Curation">000028</idno>
<idno type="wicri:Area/Main/Exploration">000028</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">La correction participative de l’OCR par crowdsourcing au profit des bibliothèques numériques</title>
<author>
<name sortKey="Andro, Mathieu" sort="Andro, Mathieu" uniqKey="Andro M" first="Mathieu" last="Andro">Mathieu Andro</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-92981" status="VALID">
<orgName>INRA</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
</desc>
<listRelation>
<relation active="#struct-92114" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle active="#struct-92114" type="direct">
<org type="institution" xml:id="struct-92114" status="VALID">
<orgName>Institut National de la Recherche Agronomique</orgName>
<orgName type="acronym">INRA</orgName>
<desc>
<address>
<country key="FR"></country>
</address>
<ref type="url">http://www.inra.fr</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Saleh, Imad" sort="Saleh, Imad" uniqKey="Saleh I" first="Imad" last="Saleh">Imad Saleh</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-39850" status="VALID">
<orgName>Laboratoire Paragraphe</orgName>
<desc>
<address>
<addrLine>Département Hypermédia - 2 rue de la Liberté - 93526 Saint-Denis cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://paragraphe.info/</ref>
</desc>
<listRelation>
<relation name="EA349" active="#struct-11141" type="direct"></relation>
</listRelation>
<tutelles>
<tutelle name="EA349" active="#struct-11141" type="direct">
<org type="institution" xml:id="struct-11141" status="VALID">
<orgName>Université Paris 8, Vincennes-Saint-Denis</orgName>
<orgName type="acronym">UP8</orgName>
<desc>
<address>
<addrLine>2 rue de la Liberté - 93526 Saint-Denis cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-paris8.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Bulletin des bibliothèques de France</title>
<idno type="ISSN">0006-2006</idno>
<imprint>
<date type="datePub">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="mix" xml:lang="fr">
<term>Correction participative de l'OCR</term>
<term>Crowdsourcing</term>
<term>Numérisation</term>
<term>OCR</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Numérisation</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">For their digitization projects, libraries often produce OCR containing errors, which can be corrected by service providers employing low-cost labor. But libraries may also appeal to web volunteers (explicit crowdsourcing), to a paid crowd (such as the Amazon Mechanical Turk marketplace), to users who correct OCR by playing games (gamification), or to internet users who do not know that they are correcting OCR (implicit crowdsourcing, as with reCAPTCHA). The profitability of these experiments is compared.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
</country>
</list>
<tree>
<country name="France">
<noRegion>
<name sortKey="Andro, Mathieu" sort="Andro, Mathieu" uniqKey="Andro M" first="Mathieu" last="Andro">Mathieu Andro</name>
</noRegion>
<name sortKey="Saleh, Imad" sort="Saleh, Imad" uniqKey="Saleh I" first="Imad" last="Saleh">Imad Saleh</name>
</country>
</tree>
</affiliations>
</record>
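
The record above can also be read with general-purpose tools instead of Dilib. The following is a minimal sketch in Python, not part of Dilib or Wicri. It assumes the record is available as a string; `RECORD` is an abridged copy of the record above, and `parse_record` is a hypothetical helper name. Because the `wicri:` and `hal:` prefixes are not declared in the record as shown, a strict XML parser would reject it, so the sketch rewrites those prefixes to `wicri_` and `hal_` before parsing. That rewriting is a workaround chosen for this example.

```python
import re
import xml.etree.ElementTree as ET

# Abridged copy of the record shown above (only the fields read below).
RECORD = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">La correction participative de l’OCR par crowdsourcing au profit des bibliothèques numériques</title>
<author>
<name sortKey="Andro, Mathieu">Mathieu Andro</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-92981" status="VALID">
<orgName>INRA</orgName>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
<author>
<name sortKey="Saleh, Imad">Imad Saleh</name>
<affiliation wicri:level="1">
<hal:affiliation type="laboratory" xml:id="struct-39850" status="VALID">
<orgName>Laboratoire Paragraphe</orgName>
</hal:affiliation>
<country>France</country>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="RBID">Hal:hal-01164263</idno>
<idno type="halId">hal-01164263</idno>
<date when="2015">2015</date>
</publicationStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="mix" xml:lang="fr">
<term>Correction participative de l'OCR</term>
<term>Crowdsourcing</term>
<term>Numérisation</term>
<term>OCR</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
</TEI>
</record>"""


def parse_record(xml_text):
    """Extract a few bibliographic fields from a record string."""
    # Workaround (assumption of this sketch): turn the undeclared prefixes
    # "wicri:" and "hal:" into "wicri_" and "hal_" in tag and attribute
    # names so that the standard-library parser accepts the text.
    cleaned = re.sub(r"(</?|\s)(wicri|hal):", r"\1\2_", xml_text)
    header = ET.fromstring(cleaned).find("./TEI/teiHeader")
    return {
        "title": header.findtext("./fileDesc/titleStmt/title"),
        "authors": [n.text for n in
                    header.findall("./fileDesc/titleStmt/author/name")],
        "labs": [o.text for o in header.findall(".//hal_affiliation/orgName")],
        "hal_id": header.findtext(
            "./fileDesc/publicationStmt/idno[@type='halId']"),
        "year": header.findtext("./fileDesc/publicationStmt/date"),
        "keywords": [t.text for t in header.findall(
            "./profileDesc/textClass/keywords[@scheme='mix']/term")],
    }


info = parse_record(RECORD)
print(info["authors"], info["hal_id"], info["year"])
```

The regular expression only touches a prefix that directly follows `<`, `</`, or whitespace, so values such as `Hal:hal-01164263` are left alone. Free text containing " hal:" after a space would also be rewritten, which is acceptable for a sketch but worth checking on other records.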

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000028 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000028 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Hal:hal-01164263
   |texte=   La correction participative de l’OCR par crowdsourcing au profit des bibliothèques numériques
}}
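
When many records need such a link, the template call above can be generated rather than typed by hand. The sketch below is an illustration only: `explor_link` is a hypothetical helper, not a Wicri or Dilib function, and it simply reproduces the parameter names and column alignment of the call shown above.

```python
def explor_link(wiki, area, flux, etape, type_, cle, texte):
    """Return the text of an {{Explor lien}} call laid out like the one above."""
    # Parameter names are copied from the template call shown above.
    fields = [
        ("wiki", wiki),
        ("area", area),
        ("flux", flux),
        ("étape", etape),
        ("type", type_),
        ("clé", cle),
        ("texte", texte),
    ]
    lines = ["{{Explor lien"]
    for name, value in fields:
        # Pad "   |name=" to 13 characters so that all values line up.
        lines.append(f"   |{name}=".ljust(13) + value)
    lines.append("}}")
    return "\n".join(lines)


link = explor_link(
    wiki="Ticri/CIDE",
    area="OcrV1",
    flux="Main",
    etape="Exploration",
    type_="RBID",
    cle="Hal:hal-01164263",
    texte="La correction participative de l’OCR par crowdsourcing "
          "au profit des bibliothèques numériques",
)
print(link)
```

The sketch does not escape values; a value containing `|` or `}}` would need extra handling before being placed in a template call.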

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024